
55 research outputs found

Efficient Discovery of Ontology Functional Dependencies

Author: Baskaran Sridevi
Chiang Fei
Keller Alexander
Lukasz Golab
Szlichta Jaroslaw
Publication date: 23/05/2017

Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve, because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen' that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages
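To make the contrast between a syntactic FD and an ontology-aware dependency concrete, here is a minimal Python sketch. It is not the paper's discovery algorithm: the toy `ONTOLOGY` mapping, the helper names `concepts`, `holds_fd` and `holds_ofd`, and the "shared concept" test are all illustrative assumptions based only on the 'Advil'/'ibuprofen' example in the abstract.

```python
from collections import defaultdict

# Hypothetical toy ontology (assumption): each value maps to the set of
# concepts it may denote. 'Advil' is-a brand name for 'ibuprofen'.
ONTOLOGY = {
    "Advil": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "Tylenol": {"acetaminophen"},
}

def concepts(value):
    """Return the concepts a value may denote; unknown values denote themselves."""
    return ONTOLOGY.get(value, {value})

def holds_fd(rows, lhs, rhs):
    """Syntactic FD lhs -> rhs: rows that agree on lhs have identical rhs values."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].add(row[rhs])
    return all(len(values) == 1 for values in groups.values())

def holds_ofd(rows, lhs, rhs):
    """Ontology-aware check (simplified assumption): rows that agree on lhs
    have rhs values that share at least one concept in the ontology."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(concepts(row[rhs]))
    return all(set.intersection(*sets) for sets in groups.values())

rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "fever", "drug": "Tylenol"},
]

fd_result = holds_fd(rows, ["symptom"], "drug")    # 'Advil' != 'ibuprofen' as strings
ofd_result = holds_ofd(rows, ["symptom"], "drug")  # both denote the concept 'ibuprofen'
```

Under these assumptions the plain FD check would flag the two "headache" rows as a violation, while the ontology-aware check accepts them because both drug names resolve to the same concept.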

arXiv.org e-Print Archive

Crossref

Profiling relational data: a survey

Author: Abedjan Ziawasch
Golab Lukasz
Naumann Felix
Publication venue: Springer Science and Business Media LLC
Publication date: 18/08/2016

Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
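The single-column statistics mentioned in this abstract (null count, distinct count, data type, most frequent value) can be sketched in a few lines of Python. The function name `profile_column` and the shape of its result are assumptions made for illustration; they are not taken from the survey or from any profiling tool.

```python
from collections import Counter

def profile_column(values):
    """Compute simple single-column statistics; None is treated as a null value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Names of the Python types observed among the non-null values.
        "types": sorted({type(v).__name__ for v in non_null}),
        # The most frequent value with its frequency (empty list if all null).
        "most_common": counts.most_common(1),
    }

# Hypothetical sample column used only for this example.
city_profile = profile_column(["Berlin", None, "Waterloo", "Berlin"])
```

Multi-column metadata such as unique column combinations or functional dependencies needs comparisons across columns and is considerably more expensive, which is the distinction the abstract draws.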

DSpace@MIT

TimeFabric: Trusted Time for Permissioned Blockchains

Author: Golab Lukasz
Gorenflo Christian
Keshav S.
Mitra Aritra
Publication venue: OASIcs - OpenAccess Series in Informatics. 4th International Symposium on Foundations and Applications of Blockchain 2021 (FAB 2021)
Publication date: 01/01/2021

As the popularity of blockchains continues to rise, blockchain platforms must be enhanced to support new application needs. In this paper, we propose one such enhancement that is essential for financial applications and online marketplaces: support for time-based logic such as verifying deadlines or expiry dates and examining a time window of recent account activity. We present a lightweight solution to reach consensus on the current time without relying on external time oracles. Our solution assigns timestamps to blocks at transaction validation time and maintains a cache reflecting the effects of recent transactions. We implement a proof-of-concept prototype, called TimeFabric, in Hyperledger Fabric, a popular permissioned blockchain platform, and experimentally demonstrate high throughput and minimal overhead (approximately 3%) of maintaining trusted time. We also demonstrate a 2x performance improvement due to the cache, compared to reconstructing account histories from the ledger.

Dagstuhl Research Online Publication Server

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

Author: Cohen Robin
Golab Lukasz
Hebert Liam
Sahu Gaurav
Sreenivas Nanda Kishore
Publication date: 18/07/2023

We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior. Comment: Under Submission

arXiv.org e-Print Archive

Statins Impair Antitumor Effects of Rituximab by Inducing Conformational Changes of CD20

Author: Basak Grzegorz W
Bil Jacek
Bojarski Lukasz
Dabrowska-Iwanicka Anna
Engelberts Patrick J
Gaciong Zbigniew
Glodkowska Eliza
Golab Jakub
Gorska Elzbieta
Issat Tadeusz
Jakobisiak Marek
Kurzaj Zuzanna
Lasek Witold
Lekka Malgorzata
Mackus Wendy J. M
Makowski Marcin
Mrowka Piotr
Nowis Dominika
Parren Paul W. H. I
Sinski Maciej
Stoklosa Tomasz
Warzocha Krzysztof
Wasik Maria
Wilczek Ewa
Wilczynski Grzegorz M
Winiarska Magdalena
Publication venue: Public Library of Science
Publication date: 01/03/2008

Jakub Golab and colleagues found that statins significantly decrease rituximab-mediated complement-dependent cytotoxicity and antibody-dependent cellular cytotoxicity against B cell lymphoma cells.

Directory of Open Access Journals

PubMed Central